@r-devulap r-devulap commented Nov 15, 2023

Benchmarks sorting an array of 2D and 3D cartesian coordinates:

Benchmark                                Time             CPU   Iterations
--------------------------------------------------------------------------
scalarobjsort<Point2D>/1000          54107 ns        54100 ns        12908
scalarobjsort<Point2D>/10000       1172135 ns      1172105 ns          597
scalarobjsort<Point2D>/100000     14652163 ns     14651279 ns           48
scalarobjsort<Point2D>/1000000   174384347 ns    174363797 ns            4
scalarobjsort<Point2D>/10000000 2042991245 ns   2042818194 ns            1
scalarobjsort<Point3D>/1000          54005 ns        53998 ns        12886
scalarobjsort<Point3D>/10000       1240230 ns      1240178 ns          564
scalarobjsort<Point3D>/100000     15452639 ns     15451391 ns           45
scalarobjsort<Point3D>/1000000   188752147 ns    188723385 ns            4
scalarobjsort<Point3D>/10000000 2182026309 ns   2181807823 ns            1
RUNNING: ./benchexe --benchmark_filter=simdobjsort.* --benchmark_out=/tmp/tmpw4_wn2mr
2023-11-15T12:55:24-08:00
Running ./benchexe
Run on (20 X 1258 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x10)
  L1 Instruction 32 KiB (x10)
  L2 Unified 1024 KiB (x10)
  L3 Unified 14080 KiB (x1)
Load Average: 0.72, 0.25, 0.17
------------------------------------------------------------------------
Benchmark                              Time             CPU   Iterations
------------------------------------------------------------------------
simdobjsort<Point2D>/1000          14763 ns        14775 ns        47513
simdobjsort<Point2D>/10000        199342 ns       199357 ns         3474
simdobjsort<Point2D>/100000      3887458 ns      3887530 ns          181
simdobjsort<Point2D>/1000000    97432099 ns     97428446 ns            6
simdobjsort<Point2D>/10000000 2243062153 ns   2242956708 ns            1
simdobjsort<Point3D>/1000          16229 ns        16238 ns        43099
simdobjsort<Point3D>/10000        213301 ns       213314 ns         3274
simdobjsort<Point3D>/100000      4175434 ns      4175449 ns          168
simdobjsort<Point3D>/1000000   105489606 ns    105483514 ns            6
simdobjsort<Point3D>/10000000 2305455681 ns   2305295989 ns            1
Comparing scalarobjsort.* to simdobjsort.* (from ./benchexe)
Benchmark                                             Time             CPU      Time Old      Time New       CPU Old       CPU New
----------------------------------------------------------------------------------------------------------------------------------
[scalarobjsort.* vs. simdobjsort.*]                -0.7272         -0.7269         54107         14763         54100         14775
[scalarobjsort.* vs. simdobjsort.*]                -0.8299         -0.8299       1172135        199342       1172105        199357
[scalarobjsort.* vs. simdobjsort.*]                -0.7347         -0.7347      14652163       3887458      14651279       3887530
[scalarobjsort.* vs. simdobjsort.*]                -0.4413         -0.4412     174384347      97432099     174363797      97428446
[scalarobjsort.* vs. simdobjsort.*]                +0.0979         +0.0980    2042991245    2243062153    2042818194    2242956708
[scalarobjsort.* vs. simdobjsort.*]                -0.6995         -0.6993         54005         16229         53998         16238
[scalarobjsort.* vs. simdobjsort.*]                -0.8280         -0.8280       1240230        213301       1240178        213314
[scalarobjsort.* vs. simdobjsort.*]                -0.7298         -0.7298      15452639       4175434      15451391       4175449
[scalarobjsort.* vs. simdobjsort.*]                -0.4411         -0.4411     188752147     105489606     188723385     105483514
[scalarobjsort.* vs. simdobjsort.*]                +0.0566         +0.0566    2182026309    2305455681    2181807823    2305295989
[scalarobjsort.* vs. simdobjsort.*]_pvalue          0.6776          0.6776      U Test, Repetitions: 10 vs 10
OVERALL_GEOMEAN                                    -0.6203         -0.6202             0             0             0             0
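
For reading the comparison table: the Time and CPU columns look consistent with a relative change of (new - old) / old, so negative values mean simdobjsort is faster. The sketch below reproduces that arithmetic. The helper names `relative_change` and `speedup` are made up for illustration and are not part of the benchmark tooling.

```cpp
#include <cassert>
#include <cmath>

// Relative change of new_time with respect to old_time.
// Negative values mean the new benchmark ran faster.
inline double relative_change(double old_time, double new_time)
{
    assert(old_time > 0.0);
    return (new_time - old_time) / old_time;
}

// Speedup factor: how many times faster the new benchmark is.
inline double speedup(double old_time, double new_time)
{
    assert(new_time > 0.0);
    return old_time / new_time;
}
```

With the Point2D/1000 row (54107 ns vs. 14763 ns), `relative_change` gives roughly -0.727, which corresponds to a speedup of about 3.7x. The 10000000-element rows are the only ones where the sign flips to positive.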


#define UNUSED(x) (void)(x)

template <typename T>
XSS_HIDE_SYMBOL void permute_array_in_place(T *A, std::vector<size_t> P)

Move to an x86simdsort::detail namespace.

Take the P parameter by const-reference, or change the call site to use std::move(arg), so the vector is not copied on every call.
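
A rough sketch of what the const-reference variant could look like. The body of permute_array_in_place is not shown in this thread, so everything below is an assumption: the name `permute_in_place`, the `sketch` namespace, and the semantics (new A[i] equals old A[P[i]]) are illustrative only. It follows permutation cycles and tracks visited slots in a separate bitmap instead of modifying P.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

namespace sketch {

// Rearranges A so that new A[i] == old A[P[i]].
// P must be a permutation of 0..P.size()-1 and is left untouched.
template <typename T>
void permute_in_place(T *A, const std::vector<std::size_t> &P)
{
    const std::size_t n = P.size();
    std::vector<bool> done(n, false); // scratch state instead of mutating P
    for (std::size_t start = 0; start < n; ++start) {
        if (done[start]) continue;
        // Walk the cycle that begins at `start`.
        T saved = std::move(A[start]);
        std::size_t dst = start;
        std::size_t src = P[dst];
        while (src != start) {
            assert(src < n);
            A[dst] = std::move(A[src]);
            done[dst] = true;
            dst = src;
            src = P[dst];
        }
        A[dst] = std::move(saved);
        done[dst] = true;
    }
}

} // namespace sketch
```

If the real implementation marks visited entries inside P itself, taking it by value and moving at the call site is probably the simpler change, since no extra bitmap is needed.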

Comment on lines +58 to +50
using return_type_of =
typename decltype(std::function {key_func})::result_type;

This relies on class template argument deduction (CTAD), which is a C++17 feature. If you need to support pre-C++17 compilers, you will need to rewrite it.
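
One possible pre-C++17 rewrite, as a sketch: deduce the key type from a call expression with decltype and std::declval, which needs only C++11. The alias name `key_result_t` and the fields of `Point2D` are assumptions for illustration; the PR's actual types are not shown here.

```cpp
#include <type_traits>
#include <utility>

// Type returned by calling a key function F with a T lvalue.
// Uses decltype on a call expression, so no CTAD is required.
// std::decay strips references and cv-qualifiers so the result
// can be stored in a std::vector.
template <typename F, typename T>
using key_result_t = typename std::decay<
        decltype(std::declval<F &>()(std::declval<T &>()))>::type;

// Hypothetical element type, standing in for the benchmark's Point2D.
struct Point2D {
    double x;
    double y;
};
```

This covers lambdas, function pointers, and functors. It does not handle pointers to members, which std::invoke_result would.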

{
using return_type_of =
typename decltype(std::function {key_func})::result_type;
std::vector<return_type_of> keys;

Add: keys.reserve(arrsize).
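
To illustrate the point, a minimal sketch of the key-extraction loop with the reservation in place. The function name `extract_keys` and its signature are hypothetical, not the PR's code. Since the final size is known up front, reserving means the vector allocates once rather than growing repeatedly while it is filled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {

// Builds the key array for arr[0..arrsize) with a single allocation.
template <typename K, typename T, typename F>
std::vector<K> extract_keys(const T *arr, std::size_t arrsize, F key_func)
{
    std::vector<K> keys;
    keys.reserve(arrsize); // capacity is fixed before the loop starts
    const std::size_t cap = keys.capacity();
    for (std::size_t i = 0; i < arrsize; ++i)
        keys.emplace_back(key_func(arr[i]));
    // No growth happened while filling.
    assert(keys.capacity() == cap);
    return keys;
}

} // namespace sketch
```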


// sort an object
template <typename T, typename F>
XSS_EXPORT_SYMBOL void object_qsort(T *arr, size_t arrsize, const F key_func)

This would be more idiomatic in C++ if you did:

template <typename It, typename F> void object_qsort(It begin, It end, F &&key_func)
{
    using T = typename std::iterator_traits<It>::value_type;
#if __cplusplus >= 201703L
    using R = std::invoke_result_t<F, T>;
#else
    using R = std::result_of_t<F(T)>;
#endif
    std::vector<R> keys;
    keys.reserve(std::distance(begin, end));
    for (auto it = begin; it != end; ++it)
        keys.emplace_back(key_func(*it));
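
The suggestion above is cut off after the key-extraction loop. One way it might continue, purely as a sketch: argsort the keys, then reorder the objects. Here std::sort stands in for the library's SIMD argsort, the name `object_sort_by_key` and the `sketch` namespace are made up, and random-access iterators are assumed.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <numeric>
#include <type_traits>
#include <utility>
#include <vector>

namespace sketch {

// Sorts [begin, end) by the value key_func returns for each element.
// Assumes random-access iterators and keys comparable with operator<.
template <typename It, typename F>
void object_sort_by_key(It begin, It end, F &&key_func)
{
    using T = typename std::iterator_traits<It>::value_type;
    using D = typename std::iterator_traits<It>::difference_type;
    using R = typename std::decay<decltype(key_func(*begin))>::type;

    const std::size_t n = static_cast<std::size_t>(std::distance(begin, end));
    std::vector<R> keys;
    keys.reserve(n);
    for (auto it = begin; it != end; ++it)
        keys.emplace_back(key_func(*it));

    // Argsort: order[i] is the index of the i-th smallest key.
    // std::sort is a placeholder for the SIMD argsort.
    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::sort(order.begin(), order.end(),
              [&keys](std::size_t a, std::size_t b) { return keys[a] < keys[b]; });

    // Gather into a temporary buffer, then move the objects back.
    std::vector<T> sorted;
    sorted.reserve(n);
    for (std::size_t idx : order)
        sorted.emplace_back(std::move(begin[static_cast<D>(idx)]));
    std::move(sorted.begin(), sorted.end(), begin);
}

} // namespace sketch
```

The temporary buffer keeps the sketch short at the cost of O(n) extra object storage. An in-place permutation would avoid that.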

@r-devulap r-devulap merged commit d9c9737 into numpy:main Nov 30, 2023